Legal case document similarity: You need both network and text
Abstract
Estimating the similarity between two legal case documents is an important and challenging problem, with downstream applications such as prior-case retrieval and citation recommendation. There are two broad approaches to the task: network-based and text-based. Prior network-based approaches consider citations only to prior cases (also called precedents) (PCNet). This approach misses signals inherent in Statutes (written laws of a jurisdiction). In this work, we propose Hier-SPCNet, which augments PCNet with a heterogeneous network of Statutes. We incorporate domain knowledge for legal document similarity into Hier-SPCNet, thereby obtaining state-of-the-art results for network-based legal document similarity. Both textual and network signals provide information about legal case similarity, but until now only trivial attempts have been made to unify the two signals. We apply several methods for combining textual and network information for estimating legal case similarity. We perform extensive experiments over legal case documents from the Indian judiciary, where the gold-standard similarity between document pairs is judged by law experts from reputed Law institutes in India. Our experiments establish that our proposed network-based methods significantly improve the correlation with the experts’ opinion when compared to the existing methods. Our best-performing combination method (which combines network-based and text-based similarity) improves the correlation with the experts’ opinion by 11.8% over the best text-based method and by 20.6% over the best network-based method. We also establish that our best-performing method can be used to recommend/retrieve citable and similar cases for a source (query) case, which is well appreciated by legal experts.
• Legal case document similarity is a practically useful task. The proposed methods substantially improve on existing ones, and the work demonstrates their utility for recommending documents.
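As a rough illustration of the evaluation idea in the abstract (combine a text-based and a network-based similarity score for each document pair, then check how well the result correlates with expert judgments), here is a minimal Python sketch. The abstract does not say which combination methods the authors use, so the weighted average below is only an assumed stand-in. The function names `combine_scores` and `pearson`, the weight `alpha`, and all the numbers are hypothetical and not taken from the paper.

```python
from math import sqrt


def combine_scores(text_sim, net_sim, alpha=0.5):
    """Weighted average of two per-pair similarity lists.

    ASSUMPTION: a simple linear mix; the paper's actual combination
    methods are not described in the abstract.
    """
    if len(text_sim) != len(net_sim):
        raise ValueError("score lists must have the same length")
    return [alpha * t + (1 - alpha) * n for t, n in zip(text_sim, net_sim)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Made-up toy scores for five document pairs (illustration only).
    text_sim = [0.82, 0.10, 0.55, 0.30, 0.91]
    net_sim = [0.60, 0.25, 0.70, 0.05, 0.80]
    expert = [0.90, 0.10, 0.60, 0.20, 1.00]

    combined = combine_scores(text_sim, net_sim, alpha=0.5)
    for name, scores in [("text", text_sim), ("network", net_sim),
                         ("combined", combined)]:
        print(f"{name:>8}: r = {pearson(scores, expert):.3f}")
```

Comparing the three correlation values mirrors, in miniature, how a combined method could be judged against text-only and network-only baselines. The toy numbers say nothing about the paper's reported gains.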
Related papers
Similarity Measures for Text Document Clustering
Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized to be more suitable as opposed to the hierarchical clustering schemes for processing large datasets....
Automatic Term Extraction and Document Similarity in Special Text Corpora
This paper confirms that the performance of a state-of-the-art automatic term extraction method on a computer science corpus is similar to previously published performance data on a medical corpus. The extracted terms are then used to estimate the similarity of papers in the computer science corpus using the standard Vector Space Model. The precision of retrieval using a term-based representati...
A Text Similarity Approach for Precedence Retrieval from Legal Documents
Precedence retrieval of legal documents is an information retrieval task to retrieve prior case documents that are related to a given case document. This helps in automatic linking of related documents to ensure that identical situations are treated similarly in every case. Several methodologies, such as information extraction based on natural language processing, rule-based method, and machine...
Vertical Bar Detection for Gauging Text Similarity of Document Images
A new method for gauging text similarity of image-based document using word shape recognition is proposed in this paper. Image features are directly extracted instead of using OCR (optical character recognition). The proposed method forms so-called vertical bar patterns by detecting local extrema points in word units extracted by segmenting the document images. These vertical bar patterns form ...
Journal
Journal title: Information Processing and Management
Year: 2022
ISSN: 0306-4573, 1873-5371
DOI: https://doi.org/10.1016/j.ipm.2022.103069